Full Transcript

From the TED Talk by Christian Rudder: Inside OKCupid: The math of online dating

Unscramble the Blue Letters

Hello, my name is Christian Rudder, and I was one of the fdurenos of OkCupid. It's now one of the bgsgiet dating sites in the United setats. Like most everyone at the site, I was a math major. As you may epecxt, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two poeple should go on a date. We built our entire bnseusis around it. Now, algorithm is a fancy word, and people like to drop it like it's this big thing. But really, an ahrtgoilm is just a systematic, step-by-step way to solve a peorlbm. It doesn't have to be facny at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done. Now, why are aitomrhgls even itmnoarpt? Why does this lesson even exsit? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step psroecses. A computer without an algorithm is bscalilay an expensive paperweight. And since computers are such a prsveavie part of everyday life, algorithms are everywhere. The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with. The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?" Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship are saying yes to this, they're going to have massive problems. We realized this early on, and so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but the answer they wntead from someone else. That wkoerd really well. But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book bruning or flag burning?" might reveal more about someone than their tatse in mveios. And it doesn't make sense to weigh all things elaquly, so we added one fianl data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life. And this ranges from irrelevant to mandatory. So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else — your ptotianel mtcah — to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result. As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is besad on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that. Here are our two example questions. The first one, let's say, is, "How messy are you?" And the answer possibilities are: very messy, average and very organized. 
And let's say you awreesnd "very organized," and you'd like someone else to answer "very organized," and the qitosuen is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. And let's say B is a little bit different. He answered "very organized" for himself, but "average" is OK with him as an answer from someone else, and the question is only a little important to him. Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him. So, let's try to compute all of this. Our first step is, since we use computers to do this, we need to assign numerical values to iades like "somewhat important" and "very important," because computers need everything in numbers. We at okuipcd decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250. Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many possible points did B score on your scale? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's wtroh 50 points and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wnrog, so B's answers were 50 out of 51 possible points. That's 98% stsictforaay. Pretty good. The second question the algorithm looks at is: How much did you sitafsy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second. 
Of those 11, that's 1 plus 10, you earned 10 — you guys satisfied each other on the second question. So your answers were 10 out of 11 equals 91 pnecert satisfactory to B. That's not bad. The final step is to take these two match pgrneteceas and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions. Because s, which is the number of questions in this smalpe, is only 2, we have: match percentage eqlaus the square root of 98 percent tiems 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know. Now, why does the algorithm mlplituy, as opposed to, say, average the two match scores together, and do the square-root business? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different peretoiprs. In other wdors, it's perfect for romantic miachntg. You've got wide ranges and you've got tons of different data points, like I said, about movies, pltiicos, religion — everything. Intuitively, too, this makes sense. Two people sfatisynig each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual. After adding a little correction for margin of error, in the case where we have a small nmuebr of qinutesos, like we do in this example, we're good to go. Any time OkCupid matches two people, it goes through the stpes we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in smpile, mathematical ways. This, the ability to take real-world pohennema and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Like you use sentences to tell a story to a psoren, you use algorithms to tell a stroy to a ctempour. 
If you learn the language, you can go out and tell your stories. I hope this will help you do that.

Open Cloze

Hello, my name is Christian Rudder, and I was one of the ________ of OkCupid. It's now one of the _______ dating sites in the United ______. Like most everyone at the site, I was a math major. As you may ______, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two ______ should go on a date. We built our entire ________ around it. Now, algorithm is a fancy word, and people like to drop it like it's this big thing. But really, an _________ is just a systematic, step-by-step way to solve a _______. It doesn't have to be _____ at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done. Now, why are __________ even _________? Why does this lesson even _____? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step _________. A computer without an algorithm is _________ an expensive paperweight. And since computers are such a _________ part of everyday life, algorithms are everywhere. The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with. The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?" Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship are saying yes to this, they're going to have massive problems. We realized this early on, and so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but the answer they ______ from someone else. That ______ really well. But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book _______ or flag burning?" might reveal more about someone than their _____ in ______. And it doesn't make sense to weigh all things _______, so we added one _____ data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life. And this ranges from irrelevant to mandatory. So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else — your _________ _____ — to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result. As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is _____ on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that. Here are our two example questions. The first one, let's say, is, "How messy are you?" And the answer possibilities are: very messy, average and very organized. 
And let's say you ________ "very organized," and you'd like someone else to answer "very organized," and the ________ is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. And let's say B is a little bit different. He answered "very organized" for himself, but "average" is OK with him as an answer from someone else, and the question is only a little important to him. Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him. So, let's try to compute all of this. Our first step is, since we use computers to do this, we need to assign numerical values to _____ like "somewhat important" and "very important," because computers need everything in numbers. We at _______ decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250. Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many possible points did B score on your scale? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's _____ 50 points and B got that right. The second question is worth only 1, because you said it was only a little important. B got that _____, so B's answers were 50 out of 51 possible points. That's 98% ____________. Pretty good. The second question the algorithm looks at is: How much did you _______ B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second. 
Of those 11, that's 1 plus 10, you earned 10 — you guys satisfied each other on the second question. So your answers were 10 out of 11 equals 91 _______ satisfactory to B. That's not bad. The final step is to take these two match ___________ and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions. Because s, which is the number of questions in this ______, is only 2, we have: match percentage ______ the square root of 98 percent _____ 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know. Now, why does the algorithm ________, as opposed to, say, average the two match scores together, and do the square-root business? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different __________. In other _____, it's perfect for romantic ________. You've got wide ranges and you've got tons of different data points, like I said, about movies, ________, religion — everything. Intuitively, too, this makes sense. Two people __________ each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual. After adding a little correction for margin of error, in the case where we have a small ______ of _________, like we do in this example, we're good to go. Any time OkCupid matches two people, it goes through the _____ we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in ______, mathematical ways. This, the ability to take real-world _________ and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Like you use sentences to tell a story to a ______, you use algorithms to tell a _____ to a ________. 
If you learn the language, you can go out and tell your stories. I hope this will help you do that.

Solution

  1. potential
  2. business
  3. burning
  4. matching
  5. people
  6. questions
  7. percent
  8. properties
  9. processes
  10. okcupid
  11. ideas
  12. problem
  13. match
  14. question
  15. equals
  16. story
  17. states
  18. satisfy
  19. simple
  20. words
  21. number
  22. worked
  23. wrong
  24. pervasive
  25. sample
  26. expect
  27. based
  28. times
  29. basically
  30. movies
  31. percentages
  32. phenomena
  33. fancy
  34. biggest
  35. steps
  36. answered
  37. founders
  38. important
  39. equally
  40. politics
  41. taste
  42. algorithm
  43. exist
  44. algorithms
  45. person
  46. wanted
  47. multiply
  48. final
  49. computer
  50. satisfying
  51. worth
  52. satisfactory

Original Text

Hello, my name is Christian Rudder, and I was one of the founders of OkCupid. It's now one of the biggest dating sites in the United States. Like most everyone at the site, I was a math major. As you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date. We built our entire business around it. Now, algorithm is a fancy word, and people like to drop it like it's this big thing. But really, an algorithm is just a systematic, step-by-step way to solve a problem. It doesn't have to be fancy at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done. Now, why are algorithms even important? Why does this lesson even exist? Well, notice one very significant phrase I used above: they are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step processes. A computer without an algorithm is basically an expensive paperweight. And since computers are such a pervasive part of everyday life, algorithms are everywhere. The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with. The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like, "Do you want to have kids one day?" "How often do you brush your teeth?" "Do you like scary movies?" And big stuff like, "Do you believe in God?" Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way.
For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like, "Do you like to be the center of attention?" If both people in a relationship are saying yes to this, they're going to have massive problems. We realized this early on, and so we decided we needed a bit more data from each question. We had to ask people to specify not only their own answer, but the answer they wanted from someone else. That worked really well. But we needed one more dimension. Some questions tell you more about a person than others. For example, a question about politics, something like, "Which is worse: book burning or flag burning?" might reveal more about someone than their taste in movies. And it doesn't make sense to weigh all things equally, so we added one final data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life. And this ranges from irrelevant to mandatory. So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else — your potential match — to answer; and third, how important the question is to you at all. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result. As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is based on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we use a small set "s" with just two questions in common, and compute a match from that. Here are our two example questions. The first one, let's say, is, "How messy are you?" And the answer possibilities are: very messy, average and very organized. 
And let's say you answered "very organized," and you'd like someone else to answer "very organized," and the question is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it. And let's say B is a little bit different. He answered "very organized" for himself, but "average" is OK with him as an answer from someone else, and the question is only a little important to him. Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him. So, let's try to compute all of this. Our first step is, since we use computers to do this, we need to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers. We at OkCupid decided on the following scale: "Irrelevant" is worth 0. "A little important" is worth 1. "Somewhat important" is worth 10. "Very important" is 50. And "absolutely mandatory" is 250. Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many possible points did B score on your scale? Well, you indicated that B's answer to the first question, about messiness, was very important to you. It's worth 50 points and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wrong, so B's answers were 50 out of 51 possible points. That's 98% satisfactory. Pretty good. The second question the algorithm looks at is: How much did you satisfy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second. 
Of those 11, that's 1 plus 10, you earned 10 — you guys satisfied each other on the second question. So your answers were 10 out of 11 equals 91 percent satisfactory to B. That's not bad. The final step is to take these two match percentages and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where "n" is the number of questions. Because s, which is the number of questions in this sample, is only 2, we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know. Now, why does the algorithm multiply, as opposed to, say, average the two match scores together, and do the square-root business? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different properties. In other words, it's perfect for romantic matching. You've got wide ranges and you've got tons of different data points, like I said, about movies, politics, religion — everything. Intuitively, too, this makes sense. Two people satisfying each other 50 percent should be a better match than two others who satisfy 0 and 100, because affection needs to be mutual. After adding a little correction for margin of error, in the case where we have a small number of questions, like we do in this example, we're good to go. Any time OkCupid matches two people, it goes through the steps we just outlined. First it collects data about your answers, then it compares your choices and preferences to other people's in simple, mathematical ways. This, the ability to take real-world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Like you use sentences to tell a story to a person, you use algorithms to tell a story to a computer. 
If you learn the language, you can go out and tell your stories. I hope this will help you do that.
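
The worked example in the talk can be sketched in a few lines of Python. This is only an illustration, not OkCupid's actual code: the names `IMPORTANCE`, `satisfaction` and `match_percentage` and the dictionary layout are assumptions made for this sketch. It also reads B as accepting only "average" on the messiness question, which is what the 10-out-of-11 figure implies. The talk describes taking the nth root of the product; with the two questions used here that is the square root, which is what the sketch computes. The margin-of-error correction mentioned in the talk is left out because the talk does not spell it out.

```python
import math

# Importance scale as quoted in the talk.
IMPORTANCE = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "mandatory": 250,
}


def satisfaction(rater, other):
    """Fraction of the rater's possible points earned by the other person's answers."""
    possible = 0
    earned = 0
    for question, prefs in rater.items():
        if question not in other:
            continue  # only questions both people answered count
        weight = IMPORTANCE[prefs["importance"]]
        possible += weight
        if other[question]["answer"] in prefs["accepts"]:
            earned += weight
    return earned / possible if possible else 0.0


def match_percentage(a, b):
    """Geometric mean of the two one-way satisfaction scores."""
    return math.sqrt(satisfaction(a, b) * satisfaction(b, a))


# Hypothetical encoding of the two people from the talk.
you = {
    "messy": {"answer": "very organized", "accepts": {"very organized"},
              "importance": "very important"},
    "attention": {"answer": "no", "accepts": {"no"},
                  "importance": "a little important"},
}
b = {
    "messy": {"answer": "very organized", "accepts": {"average"},
              "importance": "a little important"},
    "attention": {"answer": "yes", "accepts": {"no"},
                  "importance": "somewhat important"},
}

print(f"B satisfies you: {satisfaction(you, b):.0%}")   # 50 of 51 points
print(f"You satisfy B:   {satisfaction(b, you):.0%}")   # 10 of 11 points
print(f"Match:           {match_percentage(you, b):.0%}")
```

Multiplying before taking the root is what makes mutual satisfaction matter: two scores of 50 percent combine to 50 percent, while scores of 0 and 100 combine to 0, whereas a plain average would rate both pairs at 50 percent.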

Frequently Occurring Word Combinations

ngrams of length 2

collocation frequency
matching algorithm 3
match percentage 3
wide ranges 2

Important Words

  1. ability
  2. added
  3. adding
  4. addition
  5. affection
  6. algorithm
  7. algorithms
  8. analytic
  9. answer
  10. answered
  11. answers
  12. approach
  13. arrived
  14. asks
  15. assign
  16. attention
  17. attraction
  18. average
  19. bad
  20. based
  21. basically
  22. big
  23. biggest
  24. bit
  25. book
  26. break
  27. brush
  28. built
  29. burning
  30. business
  31. calculations
  32. call
  33. called
  34. case
  35. center
  36. chance
  37. choices
  38. christian
  39. collects
  40. combine
  41. common
  42. compares
  43. components
  44. compute
  45. computer
  46. computers
  47. correction
  48. crunches
  49. data
  50. date
  51. dating
  52. day
  53. days
  54. decide
  55. decided
  56. designing
  57. dimension
  58. drop
  59. early
  60. earned
  61. entire
  62. equally
  63. equals
  64. error
  65. everyday
  66. excel
  67. exist
  68. expect
  69. expensive
  70. explain
  71. expression
  72. fancy
  73. figure
  74. figuring
  75. final
  76. flag
  77. formula
  78. founders
  79. freak
  80. general
  81. geometric
  82. god
  83. good
  84. great
  85. guys
  86. happy
  87. helps
  88. hope
  89. human
  90. ideas
  91. important
  92. information
  93. intuitively
  94. irrelevant
  95. kids
  96. language
  97. learn
  98. lesson
  99. life
  100. lot
  101. love
  102. major
  103. mandatory
  104. margin
  105. massive
  106. match
  107. matches
  108. matching
  109. math
  110. mathematical
  111. messiness
  112. messy
  113. microchip
  114. movies
  115. multiplication
  116. multiplies
  117. multiply
  118. mutual
  119. mysterious
  120. neat
  121. needed
  122. notice
  123. nth
  124. number
  125. numbers
  126. numerical
  127. okcupid
  128. opposed
  129. organized
  130. outlined
  131. paperweight
  132. part
  133. people
  134. percent
  135. percentage
  136. percentages
  137. perfect
  138. person
  139. pervasive
  140. phenomena
  141. phrase
  142. plays
  143. point
  144. points
  145. politics
  146. possibilities
  147. potential
  148. practical
  149. preferences
  150. pretty
  151. previous
  152. problem
  153. problems
  154. processes
  155. properties
  156. question
  157. questions
  158. quickly
  159. ranges
  160. realized
  161. relationship
  162. religion
  163. represent
  164. result
  165. reveal
  166. role
  167. romantic
  168. root
  169. roots
  170. rudder
  171. sample
  172. satisfactory
  173. satisfied
  174. satisfy
  175. satisfying
  176. scale
  177. scary
  178. score
  179. scores
  180. sense
  181. sentences
  182. set
  183. significant
  184. simple
  185. site
  186. sites
  187. skill
  188. small
  189. solve
  190. spotlight
  191. square
  192. states
  193. step
  194. steps
  195. stories
  196. story
  197. stuff
  198. surprisingly
  199. systematic
  200. takes
  201. taste
  202. teeth
  203. time
  204. times
  205. tons
  206. tricky
  207. understand
  208. united
  209. users
  210. values
  211. wanted
  212. ways
  213. weigh
  214. wide
  215. word
  216. words
  217. work
  218. worked
  219. worth
  220. wrong